1. A method for computer-aided determination of a
sequence of actions for a system which has states, a
transition in state between two states being performed
on the basis of an action, in the case of which the
determination of the sequence of actions is performed
in such a way that a sequence of states resulting from
the sequence of actions is optimized with regard to a
prescribed optimization function, the optimization
function including a variable parameter with the aid of
which it is possible to set a risk which the resulting
sequence of states has with respect to a prescribed
state of the system.

2. The method as claimed in claim 1, in which a
method of approximative dynamic programming is used for
the purpose of determination.

3. The method as claimed in claim 2, in which the
method of approximative dynamic programming is a method
based on Q-learning.

4. The method as claimed in claim 3, in which the
optimization function OFQ is formed within Q-learning
in accordance with the following rule:

$$OFQ = Q(x; w^{a}),$$

• $x$ denoting a state in a state space X,
• $a$ denoting an action from an action space A, and
• $w^{a}$ denoting the weights of a function approximator
which belong to the action $a$,
and in which the weights of the function approximator
are adapted in accordance with the following rule:



GR 98 P 2663 


$$w_{t+1}^{a_t} = w_t^{a_t} + \eta_t \cdot \aleph^{\kappa}(d_t) \cdot \nabla Q(x_t; w_t^{a_t})$$

with the abbreviation

$$d_t = r(x_t, a_t, x_{t+1}) + \gamma \max_{a \in A} Q(x_{t+1}; w_t^{a}) - Q(x_t; w_t^{a_t})$$

• $x_t$, $x_{t+1}$ respectively denoting a state in the state
space X,
• $a_t$ denoting an action from an action space A,
• $\gamma$ denoting a prescribable reduction factor,
• $w_t^{a_t}$ denoting the weighting vector associated with
the action $a_t$ before the adaptation step,
• $w_{t+1}^{a_t}$ denoting the weighting vector associated with
the action $a_t$ after the adaptation step,
• $\eta_t$ (t = 1, ...) denoting a prescribable step size
sequence,
• $\kappa \in [-1; 1]$ denoting a risk monitoring parameter,
• $\aleph^{\kappa}$ denoting a risk monitoring function
$\aleph^{\kappa}(\xi) = (1 - \kappa \, \mathrm{sign}(\xi)) \, \xi$,
• $\nabla Q(\cdot;\cdot)$ denoting the derivation of the function
approximator according to its weights, and
• $r(x_t, a_t, x_{t+1})$ denoting a gain upon the transition
of state from the state $x_t$ to the subsequent state
$x_{t+1}$.
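
As a non-limiting illustration of the adaptation rule of claim 4, the following Python sketch performs a single weight update. It assumes a linear function approximator Q(x; w^a) = w^a · φ(x), so that the derivation according to the weights is simply the feature vector φ(x). The feature vectors, the dictionary holding one weighting vector per action, and all function names are hypothetical and are not taken from the claims.

```python
from typing import Dict, List, Sequence


def risk_transform(xi: float, kappa: float) -> float:
    """Risk monitoring function: (1 - kappa * sign(xi)) * xi."""
    sign = (xi > 0) - (xi < 0)  # -1, 0 or +1
    return (1.0 - kappa * sign) * xi


def q_value(features: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed linear approximator Q(x; w^a) = w^a . phi(x)."""
    return sum(w * f for w, f in zip(weights, features))


def q_learning_step(
    weights: Dict[str, List[float]],  # one weighting vector per action
    phi_t: Sequence[float],           # features of state x_t
    action: str,                      # action a_t
    reward: float,                    # gain r(x_t, a_t, x_{t+1})
    phi_next: Sequence[float],        # features of state x_{t+1}
    eta: float,                       # step size eta_t
    gamma: float,                     # reduction factor
    kappa: float,                     # risk monitoring parameter in [-1, 1]
) -> float:
    """Adapt the weights of a_t in place and return the temporal difference d_t."""
    best_next = max(q_value(phi_next, w) for w in weights.values())
    d_t = reward + gamma * best_next - q_value(phi_t, weights[action])
    scale = eta * risk_transform(d_t, kappa)
    # For the assumed linear approximator, the gradient w.r.t. w^{a_t} is phi_t.
    weights[action] = [w + scale * f for w, f in zip(weights[action], phi_t)]
    return d_t
```

With κ = 0 the step reduces to an ordinary Q-learning update. For κ > 0 the factor (1 - κ sign(d_t)) weights negative temporal differences more heavily than positive ones, and for κ < 0 the weighting is reversed.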

5. The method as claimed in claim 2, in which the
method of approximative dynamic programming is a method
based on TD($\lambda$)-learning.

6. The method as claimed in claim 5, in which the
optimization function OFTD is formed within TD($\lambda$)-
learning in accordance with the following rule:

$$OFTD = J(x; w)$$

• $x$ denoting a state in a state space X,
• $a$ denoting an action from an action space A, and
• $w$ denoting the weights of a function approximator
and in which the weights of the function approximator
are adapted in accordance with the following rule:

$$w_{t+1} = w_t + \eta_t \cdot \aleph^{\kappa}(d_t) \cdot z_t$$

with the abbreviations

$$d_t = r(x_t, a_t, x_{t+1}) + \gamma J(x_{t+1}; w_t) - J(x_t; w_t),$$

$$z_t = \lambda \cdot \gamma \cdot z_{t-1} + \nabla J(x_t; w_t),$$

$$z_{-1} = 0$$

• $x_t$, $x_{t+1}$ respectively denoting a state in the state
space X,
• $a_t$ denoting an action from an action space A,
• $\gamma$ denoting a prescribable reduction factor,
• $w_t$ denoting the weighting vector before the
adaptation step,
• $w_{t+1}$ denoting the weighting vector after the
adaptation step,
• $\eta_t$ (t = 1, ...) denoting a prescribable step size
sequence,
• $\kappa \in [-1; 1]$ denoting a risk monitoring parameter,
• $\aleph^{\kappa}$ denoting a risk monitoring function
$\aleph^{\kappa}(\xi) = (1 - \kappa \, \mathrm{sign}(\xi)) \, \xi$,
• $\nabla J(\cdot;\cdot)$ denoting the derivation of the function
approximator according to its weights, and
• $r(x_t, a_t, x_{t+1})$ denoting a gain upon the transition
of state from the state $x_t$ to the subsequent state
$x_{t+1}$.
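
As a non-limiting illustration of the adaptation rule of claim 6, the following Python sketch performs one TD(λ) update with the auxiliary vector z_t. It assumes a linear function approximator J(x; w) = w · φ(x), whose derivation according to the weights is the feature vector φ(x). The feature vectors and all function names are hypothetical and are not taken from the claims.

```python
from typing import List, Sequence, Tuple


def risk_transform(xi: float, kappa: float) -> float:
    """Risk monitoring function: (1 - kappa * sign(xi)) * xi."""
    sign = (xi > 0) - (xi < 0)  # -1, 0 or +1
    return (1.0 - kappa * sign) * xi


def j_value(features: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed linear approximator J(x; w) = w . phi(x)."""
    return sum(w * f for w, f in zip(weights, features))


def td_lambda_step(
    w: Sequence[float],         # weighting vector w_t
    z_prev: Sequence[float],    # z_{t-1}; all zeros before the first step
    phi_t: Sequence[float],     # features of state x_t
    reward: float,              # gain r(x_t, a_t, x_{t+1})
    phi_next: Sequence[float],  # features of state x_{t+1}
    eta: float,                 # step size eta_t
    gamma: float,               # reduction factor
    lam: float,                 # lambda of TD(lambda)
    kappa: float,               # risk monitoring parameter in [-1, 1]
) -> Tuple[List[float], List[float], float]:
    """Return (w_{t+1}, z_t, d_t) for one adaptation step."""
    d_t = reward + gamma * j_value(phi_next, w) - j_value(phi_t, w)
    # z_t = lambda * gamma * z_{t-1} + grad J(x_t; w_t); the gradient is phi_t here.
    z_t = [lam * gamma * z + f for z, f in zip(z_prev, phi_t)]
    scale = eta * risk_transform(d_t, kappa)
    w_next = [wi + scale * zi for wi, zi in zip(w, z_t)]
    return w_next, z_t, d_t
```

A caller would start with z set to the zero vector, matching z_{-1} = 0, and feed the returned w and z back into the next call along the observed sequence of states.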

7. The method as claimed in one of claims 1 to 6,
in which the system is a technical system of which
before the determination measured values are measured
which are used in determining the sequence of actions.

8. The method as claimed in claim 7, in which the
technical system is subjected to open-loop control in
accordance with the sequence of actions.

9. The method as claimed in claim 7, in which the
technical system is subjected to closed-loop control in
accordance with the sequence of actions.

10. The method as claimed in one of claims 1 to 9,
in which the system is modeled as a Markov decision
problem.

11. The method as claimed in one of claims 1 to 10,
being used in a traffic management system.

12. The method as claimed in one of claims 1 to 10,
being used in a communications system.



13. The method as claimed in one of claims 1 to 10,
being used to carry out access control in a
communications network.

14. The method as claimed in one of claims 1 to 10,
being used to carry out a routing in a communications
network.

15. An arrangement for determining a sequence of
actions for a system which has states, a transition in
state between two states being performed on the basis
of an action,
having a processor which is set up in such a way that
the determination of the sequence of actions can be
performed in such a way that a sequence of states
resulting from the sequence of actions is optimized
with regard to a prescribed optimization function, the
optimization function including a variable parameter
with the aid of which it is possible to set a risk
which the resulting sequence of states has with respect
to a prescribed state of the system.

16. The arrangement as claimed in claim 15, being
used to subject a technical system to open-loop
control.

17. The arrangement as claimed in claim 15, being
used to subject a technical system to closed-loop
control.

18. The arrangement as claimed in claim 15, being
used in a traffic management system.

19. The arrangement as claimed in claim 15, being
used in a communications system.

20. The arrangement as claimed in claim 15, being
used to carry out access control in a communications
network.

21. The arrangement as claimed in claim 15, being
used to carry out a routing in a communications
network.




